NVIDIA Launches Orchestrator-8B: An AI Model Selector
Discover NVIDIA's Orchestrator-8B, a model trained with reinforcement learning to improve tool selection.
Sakana AI introduces Reinforcement-Learned Teachers (RLTs), a method that uses reinforcement learning to train small teacher models to generate step-by-step explanations, which are then used to teach reasoning to larger language models efficiently.
Microsoft and Tsinghua researchers propose Reward Reasoning Models (RRMs), which reason before producing a reward judgment and adaptively allocate compute during evaluation, improving large language model judgment and alignment across complex tasks.
Tsinghua University researchers developed the Absolute Zero paradigm for training large language models without external data: the model proposes its own tasks, and a code executor validates those tasks and verifies the answers, creating a self-evolving loop that strengthens reasoning.
Researchers from Tsinghua University and Shanghai AI Lab introduce TTRL (Test-Time Reinforcement Learning), a method that lets large language models improve without labeled data by deriving pseudo-rewards from majority voting over their own sampled outputs at test time.